Analyzing Information Flow in Transformers - Naver Labs Europe
Seminars at NAVER LABS Europe are open to the public, but space is limited.

Abstract: We will discuss what, how and why Transformers learn by analyzing (1) the mechanisms the model uses to encode different kinds of information and (2) how the training objective defines the information flow in a model. First, we will give an in-depth analysis of multi-head attention. Using attribution methods, we will assess the importance of individual heads and show that the most important heads play interpretable roles. Surprisingly, the remaining heads are redundant: using our novel head-pruning method, they can be pruned with almost no loss in translation quality.
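To make the idea of head importance and pruning concrete, here is a minimal sketch in plain Python. It is not the attribution or pruning method presented in the talk: as a stand-in, it scores each head by simple ablation (how much the layer output changes when that head is switched off) and prunes with fixed 0/1 gates. The toy dimensions, the `VALUE_SCALES` setting, and all function names are assumptions made for illustration, and head outputs are summed directly instead of being concatenated and passed through an output projection.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head (lists of vectors)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def combine(head_outputs, gates):
    """Sum head outputs, each multiplied by its gate; gate 0 prunes a head."""
    n_pos, dim = len(head_outputs[0]), len(head_outputs[0][0])
    total = [[0.0] * dim for _ in range(n_pos)]
    for head_out, gate in zip(head_outputs, gates):
        for i in range(n_pos):
            for j in range(dim):
                total[i][j] += gate * head_out[i][j]
    return total


def distance(a, b):
    """Frobenius distance between two equally shaped matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def rand_matrix(rng, rows, cols, lo, hi):
    return [[rng.uniform(lo, hi) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N_POS, DIM = 4, 3
# Assumption for the toy setup: head 0 carries most of the signal.
VALUE_SCALES = [1.0, 0.01, 0.01]

head_outputs = []
for scale in VALUE_SCALES:
    q = rand_matrix(rng, N_POS, DIM, -1.0, 1.0)
    k = rand_matrix(rng, N_POS, DIM, -1.0, 1.0)
    v = rand_matrix(rng, N_POS, DIM, 0.5 * scale, 1.0 * scale)
    head_outputs.append(attention_head(q, k, v))

full = combine(head_outputs, [1, 1, 1])

# Ablation-based importance: output change when a single head is gated off.
importance = []
for h in range(len(head_outputs)):
    gates = [0 if i == h else 1 for i in range(len(head_outputs))]
    importance.append(distance(full, combine(head_outputs, gates)))

# Prune the two low-importance heads and measure the relative output change.
pruned = combine(head_outputs, [1, 0, 0])
zero = [[0.0] * DIM for _ in range(N_POS)]
relative_change = distance(full, pruned) / distance(full, zero)

print("importance per head:", importance)
print("relative change after pruning heads 1 and 2:", relative_change)
```

In this toy setup the two low-scale heads can be removed with only a small relative change in the layer output, which mirrors the qualitative claim in the abstract. In a trained translation model, importance would be measured against translation quality instead of raw output distance.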